Word Associations for Retrieving Web Documents

Authors

  • Mike Symonds
  • Guido Zuccon
  • Bevan Koopman
  • Peter Bruza
Abstract

Many existing information retrieval models do not explicitly take word associations into account. Our approach makes use of first- and second-order relationships found in natural language, known as syntagmatic and paradigmatic associations, respectively. This is achieved by using a formal model of word meaning within the query expansion process. On ad hoc retrieval, our approach achieves statistically significant improvements in MAP (0.158) and P@20 (0.396) over our baseline model. The ERR@20 and nDCG@20 of our system were 0.249 and 0.192, respectively. Our results and discussion suggest that information about both syntagmatic and paradigmatic associations can help improve retrieval effectiveness on ad hoc retrieval.
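To make the two kinds of association concrete, here is a minimal Python sketch. It is a toy illustration only and is not the formal model of word meaning the abstract refers to: the function names, the window size, and the tiny corpus are all assumptions made for this example. A syntagmatic (first-order) score is taken as a raw co-occurrence count within a window, and a paradigmatic (second-order) score as the cosine similarity between the two words' co-occurrence vectors.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def syntagmatic(vectors, a, b):
    """First-order association (toy version): how often b occurs near a."""
    return vectors[a][b]


def paradigmatic(vectors, a, b):
    """Second-order association (toy version): cosine of the two context vectors."""
    va, vb = vectors[a], vectors[b]
    dot = sum(count * vb[w] for w, count in va.items())
    norm_a = sqrt(sum(c * c for c in va.values()))
    norm_b = sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical four-sentence corpus, chosen so the contrast is easy to see.
corpus = [
    "the doctor prescribed the medicine",
    "the physician prescribed the medicine",
    "the doctor examined the patient",
    "the physician examined the patient",
]
vectors = cooccurrence_vectors(corpus)

# "doctor" and "prescribed" occur side by side: a syntagmatic association.
print(syntagmatic(vectors, "doctor", "prescribed"))
# "doctor" and "physician" never co-occur but share contexts: a paradigmatic association.
print(syntagmatic(vectors, "doctor", "physician"))
print(paradigmatic(vectors, "doctor", "physician"))
```

In a query expansion setting, scores of this kind could be combined to rank candidate expansion terms for each query word; how the two are weighted is a design decision that this sketch leaves open.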

Similar resources

QUT_Para at TREC 2012 Web Track: Word Associations for Retrieving Web Documents

Many existing information retrieval models do not explicitly take into account information about word associations. Our approach makes use of first and second order relationships found in natural language, known as syntagmatic and paradigmatic associations, respectively. This is achieved by using a formal model of word meaning within the query expansion process. On ad hoc retrieval, our approac...

Open-vocabulary spoken-document retrieval based on query expansion using related web documents

This paper proposes a new method for open-vocabulary spoken-document retrieval based on query expansion using related Web documents. A large vocabulary continuous speech recognition (LVCSR) system first transcribes spoken documents into word sequences, which are then segmented into semantically cohesive units (i.e., stories) using a text segmentation technique. Given a text query word, Web docu...

An improved Approach for Document Retrieval Using Suffix Trees

A huge collection of documents is available within a few mouse clicks. The current World Wide Web is a web of pages. Users have to guess possible keywords that might lead through search engines to the pages that contain information of interest and browse hundreds or even thousands of the returned pages in order to obtain what they want. In our work we build a generalized suffix tree for our documents a...

LSI meets TREC: A Status Report

Latent Semantic Indexing (LSI) is an extension of the vector retrieval method (e.g., Salton & McGill, 1983) in which the dependencies between terms and between documents, in addition to the associations between terms and documents, are explicitly taken into account. This is done by simultaneously modeling all the association of terms and documents. We assume that there is some underlying or "la...

Extracting Concepts' Relations and Users' Preferences for Personalizing Query Disambiguation

Nowadays, Web search engines play a key role in retrieving information from the Internet to provide useful Web documents in response to users’ queries. The keywords-based search engines, like GOOGLE, YAHOO Search and MSN Live Search, explore documents by matching keywords in queries with words in documents. However, some keywords have more than one meaning, and such words may be related to diff...

Publication date: 2014